Yutaka Matsuo

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

May 29, 2026
Via arXiv

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

May 27, 2026
Via arXiv

JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation

May 21, 2026
Via arXiv

E3VS-Bench: A Benchmark for Viewpoint-Dependent Active Perception in 3D Gaussian Splatting Scenes

Apr 20, 2026
Via arXiv

Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients?

Apr 20, 2026
Via arXiv

C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions

Apr 15, 2026
Via arXiv

CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

Apr 10, 2026
Via arXiv

Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling

Apr 02, 2026
Via arXiv

EC-Bench: Enumeration and Counting Benchmark for Ultra-Long Videos

Mar 31, 2026
Via arXiv

Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models

Mar 20, 2026
Via arXiv